Cyclistic Trip Data Analysis

by Mercy F. Nyambura Kariuki

Scenario

You are a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, your team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve your recommendations, so they must be backed up with compelling data insights and professional data visualizations.

Characters and teams

  • Cyclistic: A bike-share program that features more than 5,800 bicycles and 600 docking stations. Cyclistic sets itself apart by also offering reclining bikes, hand tricycles, and cargo bikes, making bike-share more inclusive to people with disabilities and riders who can’t use a standard two-wheeled bike. The majority of riders opt for traditional bikes; about 8% of riders use the assistive options. Cyclistic users are more likely to ride for leisure, but about 30% use them to commute to work each day.
  • Lily Moreno: The director of marketing and your manager. Moreno is responsible for the development of campaigns and initiatives to promote the bike-share program. These may include email, social media, and other channels.
  • Cyclistic marketing analytics team: A team of data analysts who are responsible for collecting, analyzing, and reporting data that helps guide Cyclistic marketing strategy. You joined this team six months ago and have been busy learning about Cyclistic’s mission and business goals — as well as how you, as a junior data analyst, can help Cyclistic achieve them.
  • Cyclistic executive team: The notoriously detail-oriented executive team will decide whether to approve the recommended marketing program.

About the company

In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime. Until now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments.

One approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members. Cyclistic’s finance analysts have concluded that annual members are much more profitable than casual riders. Although the pricing flexibility helps Cyclistic attract more customers, Moreno believes that maximizing the number of annual members will be key to future growth. Rather than creating a marketing campaign that targets all-new customers, Moreno believes there is a very good chance to convert casual riders into members. She notes that casual riders are already aware of the Cyclistic program and have chosen Cyclistic for their mobility needs.

Moreno has set a clear goal: Design marketing strategies aimed at converting casual riders into annual members. In order to do that, however, the marketing analyst team needs to better understand how annual members and casual riders differ, why casual riders would buy a membership, and how digital media could affect their marketing tactics. Moreno and her team are interested in analyzing the Cyclistic historical bike trip data to identify trends.

Ask

Three questions will guide the future marketing program:

  1. How do annual members and casual riders use Cyclistic bikes differently?
  2. Why would casual riders buy Cyclistic annual memberships?
  3. How can Cyclistic use digital media to influence casual riders to become members?

Prepare

You will use Cyclistic’s historical trip data to analyze and identify trends. Download the previous 12 months of Cyclistic trip data here. (Note: The datasets have a different name because Cyclistic is a fictional company. For the purposes of this case study, the datasets are appropriate and will enable you to answer the business questions. The data has been made available by Motivate International Inc. under this license.)

This is public data that you can use to explore how different customer types are using Cyclistic bikes. But note that data-privacy issues prohibit you from using riders’ personally identifiable information. This means that you won’t be able to connect pass purchases to credit card numbers to determine if casual riders live in the Cyclistic service area or if they have purchased multiple single passes.

Analyze

Now that your data is stored appropriately and has been prepared for analysis, start putting it to work.

  1. Import your data.
  2. Make columns consistent and merge them into a single dataframe.
  3. Clean up and add data to prepare for analysis.
  4. Conduct descriptive analysis.
  5. Export a summary file for further analysis.
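
The first two steps above can be sketched as follows. This is a minimal illustration, not the notebook's original code: the two in-memory CSVs stand in for monthly download files, and the header names `trip_id`, `usertype`, `ride_id` and `member_casual` are assumptions about how the schema may differ between files.

```python
import io
import pandas as pd

# Two tiny in-memory stand-ins for monthly CSV exports (hypothetical data).
jan_csv = io.StringIO(
    "ride_id,started_at,ended_at,member_casual\n"
    "A1,2021-01-01 08:00:00,2021-01-01 08:15:00,member\n"
)
feb_csv = io.StringIO(
    "trip_id,started_at,ended_at,usertype\n"
    "B1,2021-02-01 09:00:00,2021-02-01 09:40:00,casual\n"
)

# Map differing header names onto one consistent schema (assumed names).
column_map = {"trip_id": "ride_id", "usertype": "member_casual"}

# Read each file, harmonise its columns, then stack into a single dataframe.
frames = [pd.read_csv(f).rename(columns=column_map) for f in (jan_csv, feb_csv)]
trips = pd.concat(frames, ignore_index=True)

print(trips.shape)
```

With real files, the `io.StringIO` objects would be replaced by file paths passed to `pd.read_csv`.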

Stage 1: Data acquisition

Drop start_station_id, end_station_id, start_station_name and end_station_name

Convert started_at and ended_at to the datetime data type, then split them into day, month and year

Find trip duration: ended_at - started_at
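
A minimal sketch of the three steps above on a two-row sample. The column names `started_at`, `ended_at` and the four station columns come from the text; `ride_id`, `member_casual`, `trip_duration` and `trip_minutes` are assumed names, and the data is made up.

```python
import pandas as pd

# Hypothetical two-row sample with the columns named in the text.
trips = pd.DataFrame({
    "ride_id": ["A1", "A2"],
    "started_at": ["2021-06-05 10:00:00", "2021-06-06 18:30:00"],
    "ended_at": ["2021-06-05 10:25:00", "2021-06-06 18:42:30"],
    "start_station_name": ["X", "Y"],
    "start_station_id": ["1", "2"],
    "end_station_name": ["Y", "X"],
    "end_station_id": ["2", "1"],
    "member_casual": ["casual", "member"],
})

# Drop the four station columns.
station_cols = ["start_station_id", "end_station_id",
                "start_station_name", "end_station_name"]
trips = trips.drop(columns=station_cols)

# Parse the timestamp strings into datetime64 values.
for col in ("started_at", "ended_at"):
    trips[col] = pd.to_datetime(trips[col])

# Trip duration as a timedelta, plus a numeric version in minutes.
trips["trip_duration"] = trips["ended_at"] - trips["started_at"]
trips["trip_minutes"] = trips["trip_duration"].dt.total_seconds() / 60

print(trips[["ride_id", "trip_minutes"]])
```

Keeping a numeric minutes column alongside the timedelta tends to make later filtering and averaging simpler.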

Stage 2: Data Wrangling

Drop unnecessary columns

Rename columns
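
One possible renaming, sketched with `DataFrame.rename`. The targets `start_time` and `end_time` match the names used in the following steps; `bike_type` and `rider_type`, and the source names `rideable_type` and `member_casual`, are assumptions for illustration.

```python
import pandas as pd

# Hypothetical one-row sample using assumed raw column names.
trips = pd.DataFrame({
    "rideable_type": ["classic_bike"],
    "started_at": ["2021-06-05 10:00:00"],
    "ended_at": ["2021-06-05 10:25:00"],
    "member_casual": ["member"],
})

# Rename to shorter, more descriptive labels.
trips = trips.rename(columns={
    "rideable_type": "bike_type",
    "started_at": "start_time",
    "ended_at": "end_time",
    "member_casual": "rider_type",
})

print(list(trips.columns))
```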

Add new column to calculate the trip duration

Convert start_time and end_time to datetime dtype first

Split the Start time to days and months
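
A short sketch of the split using the pandas `.dt` accessor. The derived column names `day_of_week`, `month` and `year` are assumptions, and the two timestamps are sample values.

```python
import pandas as pd

# Hypothetical sample; start_time must already be a datetime column.
trips = pd.DataFrame({
    "start_time": pd.to_datetime(["2021-06-05 10:00:00", "2021-12-01 07:45:00"]),
})

# Derive calendar parts from the start time.
trips["day_of_week"] = trips["start_time"].dt.day_name()
trips["month"] = trips["start_time"].dt.month_name()
trips["year"] = trips["start_time"].dt.year

print(trips)
```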

Many rows in some months contain negative trip durations. These errors occur because the end time is earlier than the start time in those rows.

Number of rows containing Negative Values.

Number of rows with a trip duration of less than 1 minute.

Remove the 153546 rows with a negative trip duration or a ride length of less than 1 minute. Trips shorter than 60 seconds are potentially false starts or users re-docking a bike to ensure it was secure.
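
The counting and removal can be sketched with boolean masks. The four rows below are made-up examples (one valid short trip, one negative, one under a minute, one valid long trip), and `trip_minutes` is an assumed column name.

```python
import pandas as pd

# Hypothetical sample: 20 min, -5 min, 30 s and 90 min trips.
trips = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2021-03-01 08:00:00", "2021-03-01 09:00:00",
        "2021-03-01 10:00:00", "2021-03-01 11:00:00",
    ]),
    "end_time": pd.to_datetime([
        "2021-03-01 08:20:00", "2021-03-01 08:55:00",
        "2021-03-01 10:00:30", "2021-03-01 12:30:00",
    ]),
})
trips["trip_minutes"] = (
    (trips["end_time"] - trips["start_time"]).dt.total_seconds() / 60
)

# Count each kind of problem row separately.
negative = trips["trip_minutes"] < 0
under_one_minute = (trips["trip_minutes"] >= 0) & (trips["trip_minutes"] < 1)
print("negative:", int(negative.sum()))
print("under 1 minute:", int(under_one_minute.sum()))

# Keep only trips of at least one minute; this drops both groups at once.
cleaned = trips[trips["trip_minutes"] >= 1].reset_index(drop=True)
print("rows removed:", len(trips) - len(cleaned))
```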

Stage 3: Exploratory Analysis

Boxplot of the "Trip Duration" column to compare its distribution between members and casual riders.
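
As a numeric companion to the plot, the sketch below computes the five-number summary that a boxplot draws (minimum, quartiles, maximum) for each rider group. It is not the plotting code itself; the column names `rider_type` and `trip_minutes` and the sample values are assumptions.

```python
import pandas as pd

# Hypothetical trip durations in minutes for each rider group.
trips = pd.DataFrame({
    "rider_type": ["casual"] * 5 + ["member"] * 5,
    "trip_minutes": [10, 20, 30, 40, 200, 5, 8, 10, 12, 15],
})

# describe() returns count, mean, std, min, quartiles and max per group;
# keep only the values a boxplot visualises.
summary = (
    trips.groupby("rider_type")["trip_minutes"]
    .describe()[["min", "25%", "50%", "75%", "max"]]
)

print(summary)
```

If matplotlib is installed, `trips.boxplot(column="trip_minutes", by="rider_type")` would draw the corresponding chart.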

Let's check if we have negative records of trip duration

Let's see the number of rides longer than 24 hours

Roughly 22% of rides last over 24 hours.
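
A share like this can be computed from a boolean mask, as in the sketch below. The four durations are made-up values, so the printed percentage reflects the sample only, and `trip_minutes` is an assumed column name.

```python
import pandas as pd

# Hypothetical durations in minutes; one of the four exceeds 24 hours.
trips = pd.DataFrame({"trip_minutes": [15.0, 45.0, 1500.0, 30.0]})

# Flag rides longer than 24 hours (1440 minutes).
over_24h = trips["trip_minutes"] > 24 * 60

# The mean of a boolean mask is the fraction of True values.
count_over_24h = int(over_24h.sum())
share_percent = over_24h.mean() * 100

print(count_over_24h, "rides,", share_percent, "% of all rides")
```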

Stage 4: Analysis

Analyzing the Difference in Number of Rides Between Casual riders and Members.

Number of Rides in Each Month
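
A minimal sketch of a monthly ride count per rider group, using `groupby` and `unstack`. The column names `rider_type` and `month` and the six sample rides are assumptions.

```python
import pandas as pd

# Hypothetical rides spread over two months.
trips = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2021-06-05", "2021-06-20", "2021-07-04",
        "2021-06-10", "2021-07-01", "2021-07-15",
    ]),
    "rider_type": ["casual", "casual", "casual", "member", "member", "member"],
})

# A sortable year-month label for each ride.
trips["month"] = trips["start_time"].dt.strftime("%Y-%m")

# Count rides per month and rider type, one column per rider type.
monthly = (
    trips.groupby(["month", "rider_type"])
    .size()
    .unstack(fill_value=0)
)

print(monthly)
```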

Average Number of Rides in Each Weekday

Saturday is by far the day with the most rides for casual riders (876976). The day with the most rides for members is Wednesday (8010030), followed by Tuesday (781366).
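
The busiest day per rider group can be found with a cross-tabulation, as in this sketch. The six rides are made-up data chosen only to illustrate the method, and `rider_type` and `day_of_week` are assumed column names.

```python
import pandas as pd

# Hypothetical rides: casual on Sat, Sat, Sun; member on Wed, Wed, Tue.
trips = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2021-06-05", "2021-06-12", "2021-06-06",
        "2021-06-02", "2021-06-09", "2021-06-08",
    ]),
    "rider_type": ["casual", "casual", "casual", "member", "member", "member"],
})
trips["day_of_week"] = trips["start_time"].dt.day_name()

# Rides per weekday (rows) and rider type (columns), in calendar order.
weekday_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]
counts = (
    pd.crosstab(trips["day_of_week"], trips["rider_type"])
    .reindex(weekday_order, fill_value=0)
)

# idxmax returns, for each column, the weekday with the highest count.
busiest = counts.idxmax()

print(counts)
print(busiest)
```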

Average Trip Duration 2020-2022

  • The average ride length of casual riders is more than twice that of members.
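
A comparison like this can be sketched as a grouped mean followed by a ratio. The durations below are made-up values, and `rider_type` and `trip_minutes` are assumed column names.

```python
import pandas as pd

# Hypothetical trip durations in minutes.
trips = pd.DataFrame({
    "rider_type": ["casual", "casual", "member", "member"],
    "trip_minutes": [30.0, 50.0, 10.0, 20.0],
})

# Mean ride length per rider type.
avg = trips.groupby("rider_type")["trip_minutes"].mean()

# How many times longer the average casual ride is than the average member ride.
ratio = avg["casual"] / avg["member"]

print(avg)
print("casual / member ratio:", round(ratio, 2))
```

Because a few very long rides can pull the mean upward, it may be worth comparing medians as well before drawing conclusions.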

Monthly Average Trip Duration 2020-2022

Analyzing Difference in Bike Type Usage Between Casual riders and Members
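
One way to compare bike type usage is the share of each bike type within each rider group, sketched here with `pd.crosstab` and `normalize="columns"`. The bike type labels, the column names `bike_type` and `rider_type`, and the eight sample rides are assumptions.

```python
import pandas as pd

# Hypothetical rides with assumed bike type labels.
trips = pd.DataFrame({
    "rider_type": ["casual"] * 4 + ["member"] * 4,
    "bike_type": [
        "classic_bike", "electric_bike", "docked_bike", "electric_bike",
        "classic_bike", "classic_bike", "electric_bike", "classic_bike",
    ],
})

# Raw ride counts per bike type and rider type.
counts = pd.crosstab(trips["bike_type"], trips["rider_type"])

# Share of each bike type within a rider group (each column sums to 1).
shares = pd.crosstab(trips["bike_type"], trips["rider_type"], normalize="columns")

print(counts)
print(shares)
```

Shares are easier to compare than raw counts when the two groups differ a lot in total rides.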

Conclusion

Limitations

Recommendations